CS4132 Data Analytics
Eve Online is a Massively Multiplayer Online Role-Playing Game (MMORPG) set in space. To leave the space stations and explore in Eve, you need a spaceship. Each ship has different attributes, one of which is its slots. Modules fitted into these slots increase the capabilities of your ship. Some types of modules are:
Of course, there are many more categories, and each category has many different modules.
There is obviously some strategy as to which modules you would put on your ship. It is well known, for example, that you should not bring a pickaxe to a gunfight.
By using ship data from Eve, I can gain insight into the different strategies for fitting modules on ships and also see how the general community behaves when fitting them. Potentially, this could link to wider human psychology.
There is one main goal: To find out the most common situations where given modules are used.
As the number of modules is very large, the analysis will group modules based on what they do.
No correlation; however, some modules are rarely used.
Somewhat, but with many outliers.
Yes.
Mostly modules that synergize, but other pairs exist.
This data contains descriptions of about 70% of all the kills that have happened in Eve. These descriptions are called "killmails". A killmail is a snapshot of a ship, its pilot, and its surroundings at the moment the victim ship was destroyed. Because players exploring in ships have a significant chance of being killed, killmails cover most of the different situations in Eve. A killmail records not only the ship and the modules on it when it was destroyed, but also the attackers that killed it, where the ship was, and the pilot's alliance at the time.
This contains a lot of information about ids and attributes of things in the game. Some of the info that will be used is below.
This maps the item's id to its name and details.
This maps a flag to slots in the ship, useful to find out which items were fit on the ship and which items were carried in cargo.
This maps the solar system's id to the name of the solar system, its security status, and other attributes.
This maps the region ID to the name of the region, and in certain cases, the faction the region belongs to.
This maps the faction ID to the name of the faction, its race, home solar system and corporation ID.
This is the historical market data from 1 August 2021 onwards. Replace "regionid" with the region ID and "itemid" with the item ID to get results.
import pandas as pd
import json
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import numpy as np
from scipy import stats
import plotly.express as px
sample = pd.read_csv("cut_mails.csv")
sample
| | Unnamed: 0 | killmail_id | attackers | killmail_time | solar_system_id | victim.position.x | victim.position.y | victim.position.z | victim.character_id | victim.corporation_id | victim.damage_taken | victim.items | victim.ship_type_id | victim.alliance_id | victim.faction_id | moon_id | war_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 134 | 563 | [{'damage_done': 15706, 'faction_id': 500012, ... | 2007-12-06T00:24:00Z | 30001429 | NaN | NaN | NaN | 1.184150e+09 | 338591511 | 15706 | [{'flag': 87, 'item_type_id': 2444, 'quantity_... | 24700 | NaN | NaN | NaN | NaN |
| 1 | 312 | 1489 | [{'character_id': 1184117757, 'corporation_id'... | 2007-12-06T02:13:00Z | 30003286 | NaN | NaN | NaN | 4.093779e+08 | 773499566 | 436 | [] | 670 | 283331937.0 | NaN | NaN | NaN |
| 2 | 441 | 1977 | [{'alliance_id': 833571739, 'character_id': 10... | 2007-12-06T03:04:00Z | 30002098 | NaN | NaN | NaN | 9.289588e+08 | 908128976 | 4399 | [{'flag': 5, 'item_type_id': 220, 'quantity_dr... | 16240 | NaN | NaN | NaN | NaN |
| 3 | 590 | 2419 | [{'character_id': 121912466, 'corporation_id':... | 2007-12-06T03:52:00Z | 30001984 | NaN | NaN | NaN | 8.209644e+08 | 1000167 | 386 | [] | 670 | NaN | NaN | NaN | NaN |
| 4 | 730 | 2897 | [{'alliance_id': 628991027, 'character_id': 14... | 2007-12-06T04:44:00Z | 30000865 | NaN | NaN | NaN | 1.929333e+09 | 1000166 | 490 | [] | 11134 | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 477256 | 1263247 | 102609213 | [{'alliance_id': 99009764, 'character_id': 937... | 2022-08-07T09:07:31Z | 30001334 | -1.708461e+11 | 5.973561e+09 | 7.392005e+11 | 2.112485e+09 | 98659263 | 8029 | [{'flag': 13, 'item_type_id': 4405, 'quantity_... | 626 | 99009331.0 | NaN | NaN | NaN |
| 477257 | 1263382 | 102609488 | [{'damage_done': 1187, 'faction_id': 500011, '... | 2022-08-07T09:29:02Z | 30023410 | -6.046522e+11 | -8.566997e+10 | -1.734189e+12 | 1.720391e+09 | 98520878 | 1187 | [] | 3766 | 99001317.0 | NaN | NaN | NaN |
| 477258 | 1263522 | 102609796 | [{'character_id': 1192357732, 'corporation_id'... | 2022-08-07T09:50:43Z | 31001667 | -4.277093e+12 | -1.819450e+11 | 1.195963e+12 | 9.651552e+07 | 98684884 | 40756 | [{'flag': 5, 'item_type_id': 12068, 'quantity_... | 17920 | 99009116.0 | NaN | NaN | NaN |
| 477259 | 1263699 | 102610181 | [{'character_id': 96015918, 'corporation_id': ... | 2022-08-07T10:20:40Z | 30002092 | -3.808458e+12 | 3.863937e+11 | -8.717327e+12 | 2.120252e+09 | 1000179 | 451 | [] | 670 | NaN | 500003.0 | NaN | NaN |
| 477260 | 1263853 | 102610550 | [{'alliance_id': 99006371, 'character_id': 211... | 2022-08-07T10:48:27Z | 30003737 | 1.311266e+12 | 1.724271e+11 | -2.764761e+12 | 2.114119e+09 | 98512148 | 12993 | [{'flag': 5, 'item_type_id': 24490, 'quantity_... | 29990 | 99001932.0 | NaN | NaN | NaN |
477261 rows × 17 columns
attackers_sample = pd.json_normalize(json.loads(sample.loc[460228, "attackers"].replace("\'", "\"").replace("True", "1").replace("False", "0")))
attackers_sample
| | alliance_id | character_id | corporation_id | damage_done | final_blow | security_status | ship_type_id | weapon_type_id | faction_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 99003581.0 | 2.114314e+09 | 98614214.0 | 3400 | 1 | 1.3 | 17720 | 2913.0 | NaN |
| 1 | 99003581.0 | 9.403009e+07 | 98535868.0 | 585 | 0 | 0.8 | 11999 | 11999.0 | NaN |
| 2 | NaN | NaN | NaN | 10 | 0 | 0.0 | 24150 | NaN | 500010.0 |
| 3 | 99003581.0 | 2.115479e+09 | 98598862.0 | 0 | 0 | 5.0 | 12013 | 37612.0 | NaN |
| 4 | 99003581.0 | 2.114577e+09 | 98702890.0 | 0 | 0 | 5.0 | 12003 | 3025.0 | NaN |
| 5 | 99003581.0 | 1.836479e+09 | 98535868.0 | 0 | 0 | 4.5 | 11961 | 2897.0 | NaN |
| 6 | 99003581.0 | 2.113735e+09 | 98535868.0 | 0 | 0 | 5.0 | 17718 | 3025.0 | NaN |
| 7 | 99003581.0 | 2.118955e+09 | 98538918.0 | 0 | 0 | 4.9 | 12023 | 2109.0 | NaN |
items_sample = pd.json_normalize(json.loads(sample.loc[460228, "victim.items"].replace("\'", "\"")))
items_sample
| | flag | item_type_id | quantity_destroyed | singleton | quantity_dropped |
|---|---|---|---|---|---|
| 0 | 14 | 33076 | 1.0 | 0 | NaN |
| 1 | 5 | 12559 | 3.0 | 0 | NaN |
| 2 | 5 | 12559 | NaN | 0 | 1.0 |
| 3 | 92 | 31484 | 1.0 | 0 | NaN |
| 4 | 19 | 5973 | 1.0 | 0 | NaN |
| 5 | 30 | 23071 | NaN | 0 | 1.0 |
| 6 | 15 | 47255 | NaN | 0 | 1.0 |
| 7 | 27 | 2993 | NaN | 0 | 1.0 |
| 8 | 11 | 2364 | 1.0 | 0 | NaN |
| 9 | 28 | 2993 | NaN | 0 | 1.0 |
| 10 | 5 | 23085 | 4.0 | 0 | NaN |
| 11 | 93 | 31484 | 1.0 | 0 | NaN |
| 12 | 29 | 23071 | NaN | 0 | 1.0 |
| 13 | 14 | 28668 | NaN | 0 | 2.0 |
| 14 | 20 | 5405 | 1.0 | 0 | NaN |
| 15 | 13 | 2605 | NaN | 0 | 1.0 |
| 16 | 28 | 23071 | 1.0 | 0 | NaN |
| 17 | 12 | 2364 | 1.0 | 0 | NaN |
| 18 | 5 | 28668 | 91.0 | 0 | NaN |
| 19 | 27 | 23071 | 1.0 | 0 | NaN |
| 20 | 30 | 2993 | NaN | 0 | 1.0 |
| 21 | 29 | 2993 | NaN | 0 | 1.0 |
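The quote replacement used above works for this row, but it would break on any value that itself contains an apostrophe. As a hedged alternative, the sketch below parses the same kind of cell with `ast.literal_eval`, which reads Python literals directly. The cell contents (`raw_cell`) are a made-up stand-in shaped like the `victim.items` column, not a row from the dataset.

```python
import ast

import pandas as pd

# Hypothetical stringified cell, shaped like the "victim.items" column.
raw_cell = (
    "[{'flag': 5, 'item_type_id': 12559, 'quantity_destroyed': 3, 'singleton': 0}, "
    "{'flag': 27, 'item_type_id': 2993, 'quantity_dropped': 1, 'singleton': 0}]"
)

# literal_eval understands single quotes and True/False,
# so no string replacement is needed before parsing.
records = ast.literal_eval(raw_cell)

# Keys missing from a record become NaN, as in the table above.
items_demo = pd.json_normalize(records)
print(items_demo)
```

The same call should also handle the `attackers` column, since `True` and `False` are valid Python literals.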
market = pd.read_csv("market.csv").iloc[:,1:].set_index("Unnamed: 0")
market
| Unnamed: 0 | 2021-08-01 | 2021-08-02 | 2021-08-03 | 2021-08-05 | 2021-08-06 | 2021-08-07 | 2021-08-08 | 2021-08-09 | 2021-08-10 | 2021-08-11 | ... | 11/9/2022 | 12/9/2022 | 13/9/2022 | 14/9/2022 | 15/9/2022 | 16/9/2022 | 17/9/2022 | 18/9/2022 | 19/9/2022 | 20/9/2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 37.85 | 27.61 | 36.00 | 36.04 | 36.17 | 36.21 | 36.24 | 39.9 | 36.33 | 36.40 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19 | 25990.00 | 20010.00 | 12538.46 | 456.00 | 456.00 | 472.00 | 473.00 | 13990.0 | 3000.00 | 1839.25 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 20 | 825.20 | 969.00 | 840.30 | 840.20 | 840.00 | 813.10 | 815.30 | 815.3 | 975.00 | 813.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 21 | 574.70 | 574.80 | 575.00 | 575.00 | 576.00 | 576.10 | 576.40 | 576.6 | 576.60 | 576.80 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 22 | 1055.00 | 1057.00 | 1062.00 | 1060.00 | 2486.00 | 1067.00 | 1068.00 | 1068.0 | 1070.00 | 1101.00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11989 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 9.057182e+07 | 88218000.0 | 9.447667e+07 | 9.449231e+07 | 9.349214e+07 | 9.181571e+07 | 91390000.0 | 9.317692e+07 | 93298000.0 | 9.275667e+07 |
| 11993 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.321500e+08 | 134940000.0 | 1.342534e+08 | 1.336605e+08 | 1.327386e+08 | 1.295148e+08 | 132360000.0 | 1.378000e+08 | 140858620.7 | 1.453186e+08 |
| 11995 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.941762e+08 | 196285714.3 | 1.930211e+08 | 1.885273e+08 | 1.829000e+08 | 1.816880e+08 | 182280000.0 | 1.835571e+08 | 183306666.7 | 1.814350e+08 |
| 11999 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.441174e+08 | 144827777.8 | 1.409345e+08 | 1.381882e+08 | 1.334342e+08 | 1.313050e+08 | 132537735.9 | 1.323583e+08 | 131612500.0 | 1.317200e+08 |
| 12003 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.512000e+08 | 146761538.5 | 1.483000e+08 | 1.479143e+08 | 1.422059e+08 | 1.306059e+08 | 124690909.1 | 1.208824e+08 | 120463636.4 | 1.203267e+08 |
15023 rows × 832 columns
volumes = pd.read_csv("market_vol_final.csv").set_index("Unnamed: 0")
volumes
| Unnamed: 0 | 24700 | 16240 | 11134 | 1944 | 24698 | 12034 | 672 | 583 | 638 | 627 | ... | 8335 | 31462 | 21320 | 14027 | 11217 | 509 | 23919 | 27673 | 18694 | 40696 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10000014 | 1.076923 | 1.185185 | 20.286082 | 1.285714 | 1.285714 | 2.029412 | 15.729323 | 2.076923 | 1.000000 | 1.176471 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000016 | 1.311475 | 8.812500 | 40.180905 | 1.914729 | 3.329700 | 1.837209 | 163.959799 | 13.613065 | 1.792857 | 2.459574 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000023 | 2.526480 | 5.149701 | 30.211587 | 1.531746 | 1.973799 | 3.550769 | 35.562814 | 4.263473 | 1.075472 | 1.870813 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000030 | 2.185185 | 17.695652 | 39.221106 | 2.118812 | 3.541311 | 1.935223 | 23.690955 | 6.854167 | 1.663755 | 4.304878 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000032 | 7.670854 | 131.648241 | 39.449749 | 3.502762 | 7.300254 | 2.654275 | 47.195980 | 12.392405 | 3.048571 | 14.979899 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000033 | 1.096774 | 16.200000 | 21.376884 | 1.963636 | 1.600000 | 1.000000 | 45.183417 | 4.910256 | 1.196262 | 1.935252 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000038 | 1.833333 | 1.840000 | 7.359195 | 1.121951 | 1.750000 | 1.000000 | 2.878788 | 2.181818 | 1.000000 | 1.833333 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000042 | 2.064057 | 58.664141 | 24.778894 | 2.529221 | 3.217143 | 2.347656 | 22.324121 | 7.000000 | 1.764706 | 3.675141 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000047 | 1.000000 | 1.622222 | 12.602532 | 1.051724 | 1.352941 | 1.083333 | 7.501511 | 1.895833 | 1.000000 | 1.400000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000048 | 1.148148 | 4.804487 | 59.920308 | 1.296000 | 1.377358 | 1.382979 | 198.141058 | 3.308017 | 1.111111 | 2.760000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000060 | 14.879397 | 13.229008 | 143.298995 | 3.274052 | 5.212291 | 7.836788 | 57.962312 | 6.147541 | 2.139373 | 7.805085 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000064 | 1.874046 | 47.233668 | 15.660804 | 3.392344 | 1.508287 | 1.214286 | 25.379397 | 4.899135 | 1.208333 | 5.414758 | ... | 8.169935 | 1.0 | 1372.555556 | 1.0 | 1.236842 | 1.0 | NaN | NaN | NaN | NaN |
| 10000069 | 1.076923 | 2.938596 | 5.453172 | 1.140000 | 1.415094 | 1.095238 | 26.758794 | 2.325444 | 1.434783 | 2.240000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 |
13 rows × 4090 columns
inv = pd.read_csv("invTypes.csv").set_index("typeID")
inv
| typeID | groupID | typeName | description | mass | volume | capacity | portionSize | raceID | basePrice | published | marketGroupID | iconID | soundID | graphicID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | #System | NaN | 1.0 | 0.00 | 0.0 | 1 | None | None | 0 | None | None | None | 0 |
| 2 | 2 | Corporation | NaN | 0.0 | 0.00 | 0.0 | 1 | None | None | 0 | None | None | None | 0 |
| 3 | 3 | Region | NaN | 0.0 | 1.00 | 0.0 | 1 | None | None | 0 | None | None | None | 0 |
| 4 | 4 | Constellation | NaN | 0.0 | 1.00 | 0.0 | 1 | None | None | 0 | None | None | None | 0 |
| 5 | 5 | Solar System | NaN | 0.0 | 1.00 | 0.0 | 1 | None | None | 0 | None | None | None | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 369550 | 351064 | 'Loceros' Basic-H mk.0 | A prototype MN Basic loadout with prototype we... | 0.0 | 0.01 | 0.0 | 1 | None | 48000.0000 | 0 | None | None | None | 0 |
| 370308 | 368726 | 'Deathshroud' AM-M SKIN | This SKIN only applies to Medium Amarr dropsui... | 0.0 | 0.01 | 0.0 | 1 | 4 | 3000.0000 | 0 | None | None | None | 0 |
| 370488 | 368726 | ‘Tairei’s Crimson’ AM-L SKIN | This SKIN only applies to Light Amarr dropsuit... | 0.0 | 0.01 | 0.0 | 1 | 4 | 3000.0000 | 0 | None | None | None | 0 |
| 370658 | 351844 | Council's Modified Repair Tool | By projecting a focused harmonic beam into dam... | 0.0 | 0.01 | 0.0 | 1 | None | 1125.0000 | 0 | None | None | None | 0 |
| 371027 | 350858 | X-MS16 Snowball Launcher | The Mass Driver is a semi-automatic, multi-sho... | 0.0 | 0.01 | 0.0 | 1 | 4 | 47220.0000 | 0 | None | None | None | 0 |
43050 rows × 14 columns
groups = pd.read_csv("invGroups.csv")
groups
| | groupID | categoryID | groupName | iconID | useBasePrice | anchored | anchorable | fittableNonSingleton | published |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | #System | None | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | Character | None | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 1 | Corporation | None | 0 | 0 | 0 | 0 | 0 |
| 3 | 3 | 2 | Region | None | 0 | 0 | 0 | 0 | 0 |
| 4 | 4 | 2 | Constellation | None | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1451 | 367774 | 350001 | Salvage Containers | None | 0 | 0 | 0 | 0 | 0 |
| 1452 | 367776 | 350001 | Salvage Decryptors | None | 0 | 0 | 0 | 0 | 0 |
| 1453 | 368656 | 350001 | Battle Salvage | None | 1 | 0 | 0 | 0 | 0 |
| 1454 | 368666 | 350001 | Warbarge | None | 1 | 0 | 0 | 0 | 0 |
| 1455 | 368726 | 350001 | Infantry Color Skin | None | 1 | 0 | 0 | 0 | 0 |
1456 rows × 9 columns
marketgroups = pd.read_csv("invMarketGroups.csv").set_index("marketGroupID")
marketgroups
| marketGroupID | parentGroupID | marketGroupName | description | iconID | hasTypes |
|---|---|---|---|---|---|
| 2 | None | Blueprints & Reactions | Blueprints are data items used in industry for... | 2703 | 0 |
| 4 | None | Ships | Capsuleer spaceships of all sizes and roles, i... | 1443 | 0 |
| 5 | 1361 | Standard Frigates | Small, fast vessels suited to a variety of pur... | 1443 | 0 |
| 6 | 1367 | Standard Cruisers | The middle children of the starship industry, ... | 1443 | 0 |
| 7 | 1376 | Standard Battleships | The foundations of any respectable fighting fo... | 1443 | 0 |
| ... | ... | ... | ... | ... | ... |
| 2815 | 9 | Compressors | NaN | 25152 | 1 |
| 2816 | 209 | Compressor Blueprints | NaN | 2703 | 1 |
| 2819 | 1612 | Special Edition Electronic Attack Frigates | Electronic Attack Frigates which have been off... | 1443 | 1 |
| 2820 | 11 | Structure Area Denial Ammunition | Area denial ammunition, fired by structure def... | 1004 | 1 |
| 2821 | 211 | Structure Area Denial Ammunition | Blueprints of area denial ammunition. | 2703 | 1 |
1932 rows × 5 columns
flags = pd.read_csv("invFlags.csv")
flags
| | flagID | flagName | flagText | orderID |
|---|---|---|---|---|
| 0 | 0 | None | None | 0 |
| 1 | 1 | Wallet | Wallet | 10 |
| 2 | 2 | Offices | OfficeFolder | 0 |
| 3 | 3 | Wardrobe | Wardrobe | 0 |
| 4 | 4 | Hangar | Hangar | 30 |
| ... | ... | ... | ... | ... |
| 131 | 178 | Raffles | Raffles Hangar | 0 |
| 132 | 179 | FrigateEscapeBay | Frigate escape bay Hangar | 0 |
| 133 | 180 | StructureDeedBay | Structure Deed Bay | 0 |
| 134 | 181 | SpecializedIceHold | Specialized Ice Hold | 0 |
| 135 | 182 | SpecializedAsteroidHold | Specialized Asteroid Hold | 0 |
136 rows × 4 columns
solarsystems = pd.read_csv("mapSolarSystems.csv")
solarsystems
| | regionID | constellationID | solarSystemID | solarSystemName | x | y | z | xMin | xMax | yMin | ... | corridor | hub | international | regional | constellation | security | factionID | radius | sunTypeID | securityClass |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10000001 | 20000001 | 30000001 | Tanoo | -8.851079e+16 | 4.236944e+16 | -4.451353e+16 | -8.851190e+16 | -8.850926e+16 | 4.236930e+16 | ... | 0 | 1 | 1 | 1 | None | 0.858324 | 500007 | 1.323338e+12 | 45041 | B |
| 1 | 10000001 | 20000001 | 30000002 | Lashesih | -1.033010e+17 | 4.170750e+16 | -2.985630e+16 | -1.033016e+17 | -1.032995e+17 | 4.170747e+16 | ... | 1 | 0 | 1 | 1 | None | 0.751689 | 500007 | 1.018400e+12 | 45037 | B |
| 2 | 10000001 | 20000001 | 30000003 | Akpivem | -9.117414e+16 | 4.393823e+16 | -5.648282e+16 | -9.117829e+16 | -9.117334e+16 | 4.393819e+16 | ... | 0 | 1 | 0 | 0 | None | 0.846292 | 500007 | 2.473362e+12 | 3799 | B |
| 3 | 10000001 | 20000001 | 30000004 | Jark | -9.367593e+16 | 5.060424e+16 | -2.840353e+16 | -9.367738e+16 | -9.367549e+16 | 5.060420e+16 | ... | 1 | 0 | 1 | 1 | None | 0.817001 | 500007 | 1.771412e+12 | 45030 | B |
| 4 | 10000001 | 20000001 | 30000005 | Sasta | -9.478216e+16 | 4.312625e+16 | -3.189671e+16 | -9.478287e+16 | -9.477774e+16 | 4.312619e+16 | ... | 0 | 1 | 0 | 0 | None | 0.814337 | 500007 | 2.563946e+12 | 45040 | B |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8480 | 14000005 | 24000025 | 34000196 | V-196 | -3.742976e+18 | 2.252058e+18 | -6.137890e+18 | -3.742991e+18 | -3.742961e+18 | 2.252043e+18 | ... | 0 | 0 | 0 | 0 | None | -0.990000 | None | 1.495979e+13 | None | None |
| 8481 | 14000005 | 24000025 | 34000197 | V-197 | -3.762613e+18 | 2.317148e+18 | -6.127626e+18 | -3.762628e+18 | -3.762598e+18 | 2.317133e+18 | ... | 0 | 0 | 0 | 0 | None | -0.990000 | None | 1.495979e+13 | None | None |
| 8482 | 14000005 | 24000025 | 34000198 | V-198 | -3.726805e+18 | 2.273820e+18 | -6.118384e+18 | -3.726820e+18 | -3.726790e+18 | 2.273805e+18 | ... | 0 | 0 | 0 | 0 | None | -0.990000 | None | 1.495979e+13 | None | None |
| 8483 | 14000005 | 24000025 | 34000199 | V-199 | -3.702467e+18 | 2.271227e+18 | -6.075477e+18 | -3.702482e+18 | -3.702452e+18 | 2.271212e+18 | ... | 0 | 0 | 0 | 0 | None | -0.990000 | None | 1.495979e+13 | None | None |
| 8484 | 14000005 | 24000025 | 34000200 | V-200 | -3.726768e+18 | 2.248087e+18 | -6.097488e+18 | -3.726783e+18 | -3.726753e+18 | 2.248072e+18 | ... | 0 | 0 | 0 | 0 | None | -0.990000 | None | 1.495979e+13 | None | None |
8485 rows × 26 columns
regi = pd.read_csv("mapRegions.csv").set_index("regionID")
regi
| regionID | regionName | x | y | z | xMin | xMax | yMin | yMax | zMin | zMax | factionID | nebula | radius |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10000001 | Derelik | -7.736195e+16 | 5.087803e+16 | -6.443310e+16 | -1.055500e+17 | -4.917392e+16 | 2.712855e+16 | 7.462751e+16 | 2.642336e+16 | 1.024428e+17 | 500007 | 11799 | None |
| 10000002 | The Forge | -9.642033e+16 | 6.402708e+16 | 1.125398e+17 | -1.436457e+17 | -4.919500e+16 | 3.515456e+16 | 9.289960e+16 | -1.444526e+17 | -8.062703e+16 | 500001 | 11806 | None |
| 10000003 | Vale of the Silent | -4.406932e+16 | 9.472944e+16 | 1.813847e+17 | -9.923376e+16 | 1.109511e+16 | 5.820417e+16 | 1.312547e+17 | -2.188796e+17 | -1.438898e+17 | None | 11814 | None |
| 10000004 | UUA-F4 | 8.986800e+16 | 5.478010e+16 | 2.725758e+17 | 6.739083e+16 | 1.123452e+17 | 1.386504e+16 | 9.569515e+16 | -3.807742e+17 | -1.643773e+17 | None | 11817 | None |
| 10000005 | Detorid | 1.335404e+17 | -3.139150e+16 | -1.963923e+17 | 5.808592e+16 | 2.089949e+17 | -5.072033e+16 | -1.206267e+16 | 1.647489e+17 | 2.280357e+17 | None | 11849 | None |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14000001 | VR-01 | -3.900972e+18 | 2.574945e+18 | -8.266928e+18 | -4.050972e+18 | -3.750972e+18 | 2.424945e+18 | 2.724945e+18 | -8.416928e+18 | -8.116928e+18 | None | 11821 | None |
| 14000002 | VR-02 | -3.731107e+18 | 3.112926e+18 | -8.155502e+18 | -3.881107e+18 | -3.581107e+18 | 2.962926e+18 | 3.262926e+18 | -8.305502e+18 | -8.005502e+18 | None | 11821 | None |
| 14000003 | VR-03 | -5.431842e+18 | 2.985429e+18 | -6.018316e+18 | -5.581842e+18 | -5.281842e+18 | 2.835429e+18 | 3.135429e+18 | -6.168316e+18 | -5.868316e+18 | None | 11821 | None |
| 14000004 | VR-04 | -4.545299e+18 | 2.308091e+18 | -6.316707e+18 | -4.695299e+18 | -4.395299e+18 | 2.158091e+18 | 2.458091e+18 | -6.466707e+18 | -6.166707e+18 | None | 11821 | None |
| 14000005 | VR-05 | -3.876324e+18 | 2.174764e+18 | -5.975813e+18 | -4.026324e+18 | -3.726324e+18 | 2.024764e+18 | 2.324764e+18 | -6.125813e+18 | -5.825813e+18 | None | 11821 | None |
112 rows × 13 columns
factions = pd.read_csv("chrFactions.csv")
factions
| | factionID | factionName | description | raceIDs | solarSystemID | corporationID | sizeFactor | stationCount | stationSystemCount | militiaCorporationID | iconID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 500001 | Caldari State | The Caldari State is ruled by several mega-cor... | 1 | 30000145 | 1000035 | 5.0 | None | None | 1000180 | 1439 |
| 1 | 500002 | Minmatar Republic | The Minmatar Republic was formed over a centur... | 2 | 30002544 | 1000051 | 5.0 | None | None | 1000182 | 1440 |
| 2 | 500003 | Amarr Empire | The largest of the five main empires, the Amar... | 4 | 30002187 | 1000084 | 5.0 | None | None | 1000179 | 1442 |
| 3 | 500004 | Gallente Federation | The Gallente Federation encompasses several ra... | 8 | 30004993 | 1000120 | 5.0 | None | None | 1000181 | 1441 |
| 4 | 500005 | Jove Empire | The Jove Empire is isolated from the rest of t... | 16 | 30001642 | 1000149 | 5.0 | None | None | None | 2195 |
| 5 | 500006 | CONCORD Assembly | CONCORD is an independent organization founded... | 1 | 30005204 | 1000137 | 5.0 | None | None | None | 1434 |
| 6 | 500007 | Ammatar Mandate | The Ammatars are part of the Amarr Empire, but... | 2 | 30000001 | 1000123 | 4.0 | None | None | None | 10172 |
| 7 | 500008 | Khanid Kingdom | The Khanid Kingdom, also known as the Dark Ama... | 4 | 30003863 | 1000156 | 4.0 | None | None | None | 10173 |
| 8 | 500009 | The Syndicate | Formed by Intaki exiles from the Gallente Fede... | 8 | 30003271 | 1000146 | 4.0 | None | None | None | 1437 |
| 9 | 500010 | Guristas Pirates | Formed by two former members of the Caldari Na... | 1 | 30001290 | 1000127 | 4.0 | None | None | None | 1630 |
| 10 | 500011 | Angel Cartel | Operating from the heart of the Curse region, ... | 1 | 30001045 | 1000138 | 4.0 | None | None | None | 10174 |
| 11 | 500012 | Blood Raider Covenant | The Amarr Empire has had its share of religiou... | 4 | 30003088 | 1000134 | 3.0 | None | None | None | 1441 |
| 12 | 500013 | The InterBus | The InterBus is one of the more successful joi... | 1 | 30005203 | 1000148 | 3.0 | None | None | None | 96 |
| 13 | 500014 | ORE | Outer Ring Excavations, or ORE, is the largest... | 8 | 30004504 | 1000129 | 3.0 | None | None | None | 1720 |
| 14 | 500015 | Thukker Tribe | The Thukker tribe is one of the seven original... | 2 | 30000905 | 1000163 | 3.0 | None | None | None | 10175 |
| 15 | 500016 | Servant Sisters of EVE | The Sisters of EVE are mainly known for their ... | 1 | 30001978 | 1000130 | 3.0 | None | None | None | 1004 |
| 16 | 500017 | The Society of Conscious Thought | The Society of Conscious Thought is three cent... | 16 | 30002423 | 1000131 | 3.0 | None | None | None | 10176 |
| 17 | 500018 | Mordu's Legion Command | The origin of Mordu's Legion lies in the Galle... | 1 | 30002005 | 1000128 | 3.0 | None | None | None | 1722 |
| 18 | 500019 | Sansha's Nation | Sansha's Nation was founded more than a centur... | 1 | 30001868 | 1000162 | 4.0 | None | None | None | 10177 |
| 19 | 500020 | Serpentis | The Serpentis Corporation was founded a few de... | 1 | 30004623 | 1000135 | 4.0 | None | None | None | 10178 |
| 20 | 500021 | Unknown | Unknown | 1 | 30005286 | None | 0.0 | None | None | None | 0 |
| 21 | 500024 | Drifters | Emerging from the ruins of the Sleeper civiliz... | 16 | 30005286 | 1000274 | 0.0 | None | None | None | 21404 |
| 22 | 500025 | Rogue Drones | While rogues drones come in all shapes, sizes ... | 134 | 30005286 | 1000287 | 0.0 | None | None | None | 20996 |
| 23 | 500026 | Triglavian Collective | The Triglavian Collective appears to be a huma... | 135 | 30005286 | 1000298 | 5.0 | None | None | None | 20996 |
| 24 | 500027 | EDENCOM | EDENCOM is the New Eden Common Defense Initiat... | 1 | 30005204 | 1000297 | 5.0 | None | None | None | 24419 |
| 25 | 500028 | Association for Interdisciplinary Research | The Association for Interdisciplinary Research... | 4 | 30005305 | 1000413 | 5.0 | None | None | None | 21 |
Dataset used: https://data.everef.net/killmails
When the killmails are downloaded, they arrive as text files with JSON objects scattered through them. The objects are extracted and placed into dataframes, with the columns standardized for easy formatting. The dataframes are then saved and removed from memory when they are too large to fit (~5GB).
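The extraction step can be sketched as follows. This is a minimal illustration, not the code used for this project: the helper `extract_json_objects` and the sample text are hypothetical, and it assumes each killmail is a complete JSON object embedded in otherwise arbitrary text.

```python
import json

import pandas as pd


def extract_json_objects(text):
    """Yield every top-level JSON object found in text, skipping other content."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            # raw_decode parses one JSON value starting at `start`
            # and returns it together with the index where it ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, try the next brace
            continue
        yield obj
        pos = end


# Made-up text standing in for a downloaded file.
demo_text = 'header {"killmail_id": 1, "victim": {"ship_type_id": 670}} noise {"killmail_id": 2} tail'
found = list(extract_json_objects(demo_text))

# json_normalize flattens nested keys into dotted column names such as "victim.ship_type_id".
frame = pd.json_normalize(found)
print(frame)
```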
The csv files are combined into one and then a systematic sample (1 row every ~150 rows) is taken. This is where sample comes from.
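A systematic sample of this kind can be sketched with a strided slice. The fixed step of 150 and the stand-in dataframe `full` are assumptions for illustration; the text only states that roughly one row in 150 was kept.

```python
import pandas as pd

# Stand-in for the combined killmail table (hypothetical data).
full = pd.DataFrame({"killmail_id": range(1, 1001)})

# Keep every 150th row, starting from the first.
step = 150
demo_sample = full.iloc[::step]
print(demo_sample)
```

Unlike a random sample, this preserves the time ordering of the killmails and spreads the kept rows evenly across the whole period.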
The market data comes from the website in JSON format. Each item is stored as a row, and dates with no data are filled with NaN. The average price for each date is kept, with the date as the column and the item id as the row index.
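The reshaping described above might look like the sketch below. The field names `date`, `average` and `volume`, and the volume values, are assumptions about the JSON layout made for illustration; only the prices are borrowed from the table shown earlier.

```python
import pandas as pd

# Hypothetical history for one item: one record per day that has data.
history = [
    {"date": "2021-08-01", "average": 37.85, "volume": 1200},
    {"date": "2021-08-02", "average": 27.61, "volume": 900},
    {"date": "2021-08-05", "average": 36.04, "volume": 1500},
]
item_id = 18

# One Series per item: the dates become the labels, the Series name is the item id.
row = pd.Series({day["date"]: day["average"] for day in history}, name=item_id)
other = pd.Series({"2021-08-02": 20010.0, "2021-08-03": 12538.46}, name=19)

# Stacking the Series aligns them on date; dates an item lacks become NaN.
market_demo = pd.DataFrame([row, other]).sort_index(axis=1)
print(market_demo)
```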
This time the volume is taken instead of the average price, and the mean over all days is used so that days with NaN data drop out.
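As a small illustration of why the mean removes the gaps: pandas skips NaN by default when averaging, so only the days with data contribute. The numbers below are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical daily volumes for two items in one region (rows are item ids).
daily_volume = pd.DataFrame(
    {
        "2021-08-01": [10.0, np.nan],
        "2021-08-02": [np.nan, np.nan],
        "2021-08-03": [30.0, 4.0],
    },
    index=[24700, 16240],
)

# mean() uses skipna=True by default, so NaN days are ignored.
mean_volume = daily_volume.mean(axis=1)
print(mean_volume)
```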
A function is made to take the json of items out of the column. The items are then filtered out based on the 'flag' and then the prices are extracted from the market data. If the price is NaN, it takes the average price. Same for the ship. The information is then stored in a dataframe.
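A rough sketch of such a function is given below; the actual code is in the appendix and may differ. The function name `price_fitted_items`, the flag ranges treated as fitted slots (11 to 34 and 92 to 99, to be checked against invFlags.csv), and the choice of the item's mean price over all dates as the fallback are all assumptions.

```python
import numpy as np
import pandas as pd

# Assumed flag ids for fitted slots; verify against invFlags.csv.
FITTED_FLAGS = set(range(11, 35)) | set(range(92, 100))


def price_fitted_items(items, prices, day):
    """Return one row per fitted item with its price on `day`.

    `prices` is indexed by item id with one column per date.
    A NaN price falls back to the item's mean over all dates (assumed rule).
    """
    rows = []
    for item in items:
        if item["flag"] not in FITTED_FLAGS:
            continue  # cargo and other non-fitted locations are skipped
        type_id = item["item_type_id"]
        if type_id in prices.index:
            price = prices.loc[type_id, day]
            if pd.isna(price):
                price = prices.loc[type_id].mean()
        else:
            price = np.nan  # item has no market data at all
        rows.append({"item_type_id": type_id, "flag": item["flag"], "price": price})
    return pd.DataFrame(rows)


# Made-up prices and items for illustration.
prices = pd.DataFrame(
    {"2021-08-01": [100.0, np.nan], "2021-08-02": [120.0, 50.0]},
    index=[2993, 23071],
)
items = [
    {"flag": 27, "item_type_id": 2993},
    {"flag": 5, "item_type_id": 12559},
    {"flag": 30, "item_type_id": 23071},
]
result = price_fitted_items(items, prices, "2021-08-01")
print(result)
```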
Similarly to the pricing data, the attackers and items are extracted from their jsons. Based on how the attackers are, the items are saved into a dataframe.
For more information on the code, check the appendix. In the interest of time, none of the code will be placed here.
Because the values are whole-number counts, modules and regions with many zeros would bias the dataset. We therefore limit the data to modules with over 500 total occurrences and regions with more than 100,000 modules recorded.
In the interest of time, calculation of the number of items found in each region throughout the killmail data has already been done and stored in region_items.csv.
regions = pd.read_csv("region_items.csv")
regions = regions.groupby("Region").sum()
regions = regions.T[regions.sum()>500].T[regions.sum(axis=1)>100000]
regions2 = regions.copy()
regions = regions / regions.sum()
print(regions.index.values.tolist())
# One bar chart per region, with modules sorted by ascending relative frequency.
for ind in regions.index.values:
    plt.figure(figsize=(20,8))
    bot15 = regions.loc[ind].sort_values(ascending=True)[0:]
    # Column labels are item type IDs; convert them to numbers to look up names in inv.
    x = pd.to_numeric(bot15.index.values).tolist()
    plot = px.bar(x = inv.loc[x, "typeName"], y=bot15.values)
    plot.update_layout(title_text = (regi.loc[ind, "regionName"]), yaxis_title = "Relative Frequency", xaxis_title = "Module")
    plot.show()
regions
[10000002, 10000014, 10000016, 10000023, 10000030, 10000032, 10000033, 10000038, 10000042, 10000047, 10000048, 10000060, 10000064, 10000069]
| Region | 12773 | 31179 | 2913 | 3841 | 519 | 8089 | 1999 | 27387 | 5975 | 3244 | ... | 33474 | 33816 | 34317 | 34562 | 34828 | 35683 | 42685 | 12198 | 47466 | 49710 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10000002 | 0.080018 | 0.115866 | 0.074108 | 0.086198 | 0.188332 | 0.074222 | 0.083310 | 0.039119 | 0.095040 | 0.054379 | ... | 0.255172 | 0.036458 | 0.027426 | 0.054492 | 0.055901 | 0.050754 | 0.156951 | 0.000000 | 0.135659 | 0.041152 |
| 10000014 | 0.126062 | 0.107516 | 0.113890 | 0.162075 | 0.088424 | 0.152851 | 0.136867 | 0.180389 | 0.127495 | 0.112199 | ... | 0.068966 | 0.067708 | 0.132911 | 0.142857 | 0.114907 | 0.091907 | 0.031390 | 0.267442 | 0.046512 | 0.045267 |
| 10000016 | 0.032633 | 0.043841 | 0.041828 | 0.051538 | 0.035217 | 0.040808 | 0.033075 | 0.027332 | 0.037844 | 0.026594 | ... | 0.113793 | 0.026042 | 0.025316 | 0.030928 | 0.045031 | 0.028807 | 0.067265 | 0.000000 | 0.100775 | 0.061728 |
| 10000023 | 0.055878 | 0.070981 | 0.054103 | 0.100914 | 0.053969 | 0.089436 | 0.062137 | 0.139563 | 0.060747 | 0.061127 | ... | 0.055172 | 0.093750 | 0.086498 | 0.123711 | 0.093168 | 0.057613 | 0.040359 | 0.322674 | 0.069767 | 0.082305 |
| 10000030 | 0.077783 | 0.051148 | 0.063878 | 0.036056 | 0.059965 | 0.046922 | 0.060753 | 0.034677 | 0.049479 | 0.058150 | ... | 0.051724 | 0.046875 | 0.050633 | 0.042710 | 0.018634 | 0.042524 | 0.049327 | 0.000000 | 0.027132 | 0.028807 |
| 10000032 | 0.026375 | 0.039666 | 0.030689 | 0.034173 | 0.025765 | 0.041945 | 0.032937 | 0.010079 | 0.037232 | 0.025073 | ... | 0.031034 | 0.031250 | 0.021097 | 0.020619 | 0.017081 | 0.045267 | 0.098655 | 0.000000 | 0.096899 | 0.028807 |
| 10000033 | 0.060796 | 0.080376 | 0.063424 | 0.060813 | 0.094573 | 0.076923 | 0.067672 | 0.068671 | 0.078138 | 0.088714 | ... | 0.062069 | 0.140625 | 0.086498 | 0.079529 | 0.085404 | 0.085048 | 0.139013 | 0.000000 | 0.089147 | 0.131687 |
| 10000038 | 0.080018 | 0.057411 | 0.086383 | 0.035079 | 0.063319 | 0.045073 | 0.068364 | 0.037752 | 0.065891 | 0.079518 | ... | 0.044828 | 0.062500 | 0.105485 | 0.070692 | 0.034161 | 0.028807 | 0.053812 | 0.000000 | 0.034884 | 0.069959 |
| 10000042 | 0.073759 | 0.037578 | 0.089111 | 0.043308 | 0.074042 | 0.051614 | 0.065735 | 0.029723 | 0.057440 | 0.058018 | ... | 0.058621 | 0.062500 | 0.048523 | 0.033873 | 0.029503 | 0.057613 | 0.067265 | 0.000000 | 0.081395 | 0.074074 |
| 10000047 | 0.094323 | 0.092902 | 0.089793 | 0.092963 | 0.055900 | 0.065264 | 0.083310 | 0.063375 | 0.088059 | 0.072374 | ... | 0.041379 | 0.041667 | 0.105485 | 0.091311 | 0.080745 | 0.071331 | 0.022422 | 0.200581 | 0.050388 | 0.045267 |
| 10000048 | 0.087170 | 0.061587 | 0.070925 | 0.052793 | 0.048379 | 0.047775 | 0.060338 | 0.047831 | 0.071647 | 0.074226 | ... | 0.062069 | 0.065104 | 0.061181 | 0.038292 | 0.068323 | 0.069959 | 0.085202 | 0.000000 | 0.046512 | 0.069959 |
| 10000060 | 0.083594 | 0.093946 | 0.082519 | 0.132854 | 0.068554 | 0.112043 | 0.096457 | 0.200205 | 0.070545 | 0.077335 | ... | 0.068966 | 0.057292 | 0.061181 | 0.076583 | 0.243789 | 0.167353 | 0.026906 | 0.209302 | 0.143411 | 0.152263 |
| 10000064 | 0.024139 | 0.048017 | 0.048193 | 0.033336 | 0.036030 | 0.035831 | 0.043039 | 0.021182 | 0.053644 | 0.051535 | ... | 0.027586 | 0.057292 | 0.037975 | 0.038292 | 0.021739 | 0.057613 | 0.058296 | 0.000000 | 0.038760 | 0.053498 |
| 10000069 | 0.097452 | 0.099165 | 0.091157 | 0.077899 | 0.107531 | 0.119295 | 0.106006 | 0.100102 | 0.106797 | 0.160757 | ... | 0.058621 | 0.210938 | 0.149789 | 0.156112 | 0.091615 | 0.145405 | 0.103139 | 0.000000 | 0.038760 | 0.115226 |
14 rows × 1045 columns
regions2 = regions2.loc[volumes.index.values]
regions2 = regions2 / regions2.sum()
regions2
| Region | 12773 | 31179 | 2913 | 3841 | 519 | 8089 | 1999 | 27387 | 5975 | 3244 | ... | 33474 | 33816 | 34317 | 34562 | 34828 | 35683 | 42685 | 12198 | 47466 | 49710 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10000014 | 0.137026 | 0.121606 | 0.123005 | 0.177364 | 0.108941 | 0.165105 | 0.149306 | 0.187733 | 0.140885 | 0.118651 | ... | 0.092593 | 0.070270 | 0.136659 | 0.151090 | 0.121711 | 0.096821 | 0.037234 | 0.267442 | 0.053812 | 0.047210 |
| 10000016 | 0.035471 | 0.049587 | 0.045176 | 0.056399 | 0.043388 | 0.044079 | 0.036081 | 0.028444 | 0.041819 | 0.028124 | ... | 0.152778 | 0.027027 | 0.026030 | 0.032710 | 0.047697 | 0.030347 | 0.079787 | 0.000000 | 0.116592 | 0.064378 |
| 10000023 | 0.060739 | 0.080283 | 0.058434 | 0.110433 | 0.066491 | 0.096606 | 0.067784 | 0.145244 | 0.067127 | 0.064643 | ... | 0.074074 | 0.097297 | 0.088937 | 0.130841 | 0.098684 | 0.060694 | 0.047872 | 0.322674 | 0.080717 | 0.085837 |
| 10000030 | 0.084548 | 0.057851 | 0.068991 | 0.039457 | 0.073879 | 0.050683 | 0.066274 | 0.036089 | 0.054676 | 0.061494 | ... | 0.069444 | 0.048649 | 0.052061 | 0.045171 | 0.019737 | 0.044798 | 0.058511 | 0.000000 | 0.031390 | 0.030043 |
| 10000032 | 0.028669 | 0.044864 | 0.033145 | 0.037396 | 0.031743 | 0.045308 | 0.035930 | 0.010489 | 0.041142 | 0.026515 | ... | 0.041667 | 0.032432 | 0.021692 | 0.021807 | 0.018092 | 0.047688 | 0.117021 | 0.000000 | 0.112108 | 0.030043 |
| 10000033 | 0.066084 | 0.090909 | 0.068500 | 0.066550 | 0.116516 | 0.083090 | 0.073822 | 0.071467 | 0.086345 | 0.093816 | ... | 0.083333 | 0.145946 | 0.088937 | 0.084112 | 0.090461 | 0.089595 | 0.164894 | 0.000000 | 0.103139 | 0.137339 |
| 10000038 | 0.086978 | 0.064935 | 0.093297 | 0.038388 | 0.078012 | 0.048687 | 0.074577 | 0.039289 | 0.072811 | 0.084091 | ... | 0.060185 | 0.064865 | 0.108460 | 0.074766 | 0.036184 | 0.030347 | 0.063830 | 0.000000 | 0.040359 | 0.072961 |
| 10000042 | 0.080175 | 0.042503 | 0.096244 | 0.047394 | 0.091222 | 0.055752 | 0.071709 | 0.030933 | 0.063473 | 0.061354 | ... | 0.078704 | 0.064865 | 0.049892 | 0.035826 | 0.031250 | 0.060694 | 0.079787 | 0.000000 | 0.094170 | 0.077253 |
| 10000047 | 0.102527 | 0.105077 | 0.096980 | 0.101732 | 0.068871 | 0.070496 | 0.090882 | 0.065956 | 0.097307 | 0.076536 | ... | 0.055556 | 0.043243 | 0.108460 | 0.096573 | 0.085526 | 0.075145 | 0.026596 | 0.200581 | 0.058296 | 0.047210 |
| 10000048 | 0.094752 | 0.069658 | 0.076602 | 0.057773 | 0.059604 | 0.051605 | 0.065821 | 0.049778 | 0.079172 | 0.078494 | ... | 0.083333 | 0.067568 | 0.062907 | 0.040498 | 0.072368 | 0.073699 | 0.101064 | 0.000000 | 0.053812 | 0.072961 |
| 10000060 | 0.090865 | 0.106257 | 0.089123 | 0.145387 | 0.084460 | 0.121026 | 0.105223 | 0.208356 | 0.077954 | 0.081783 | ... | 0.092593 | 0.059459 | 0.062907 | 0.080997 | 0.258224 | 0.176301 | 0.031915 | 0.209302 | 0.165919 | 0.158798 |
| 10000064 | 0.026239 | 0.054309 | 0.052050 | 0.036480 | 0.044390 | 0.038704 | 0.046950 | 0.022044 | 0.059277 | 0.054498 | ... | 0.037037 | 0.059459 | 0.039046 | 0.040498 | 0.023026 | 0.060694 | 0.069149 | 0.000000 | 0.044843 | 0.055794 |
| 10000069 | 0.105928 | 0.112161 | 0.098453 | 0.085248 | 0.132482 | 0.128859 | 0.115640 | 0.104178 | 0.118013 | 0.170001 | ... | 0.078704 | 0.218919 | 0.154013 | 0.165109 | 0.097039 | 0.153179 | 0.122340 | 0.000000 | 0.044843 | 0.120172 |
13 rows × 1045 columns
volumes = volumes / volumes.sum()
volumes
| Unnamed: 0 | 24700 | 16240 | 11134 | 1944 | 24698 | 12034 | 672 | 583 | 638 | 627 | ... | 8335 | 31462 | 21320 | 14027 | 11217 | 509 | 23919 | 27673 | 18694 | 40696 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10000014 | 0.027097 | 0.003811 | 0.044119 | 0.049218 | 0.036878 | 0.070059 | 0.023397 | 0.028899 | 0.051453 | 0.022688 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000016 | 0.032998 | 0.028334 | 0.087388 | 0.073298 | 0.095506 | 0.063424 | 0.243890 | 0.189417 | 0.092248 | 0.047432 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000023 | 0.063569 | 0.016557 | 0.065706 | 0.058637 | 0.056614 | 0.122579 | 0.052900 | 0.059324 | 0.055336 | 0.036078 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000030 | 0.054982 | 0.056895 | 0.085300 | 0.081110 | 0.101575 | 0.066807 | 0.035240 | 0.095372 | 0.085605 | 0.083017 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000032 | 0.193009 | 0.423274 | 0.085798 | 0.134089 | 0.209393 | 0.091630 | 0.070204 | 0.172433 | 0.156858 | 0.288879 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000033 | 0.027596 | 0.052086 | 0.046492 | 0.075170 | 0.045893 | 0.034522 | 0.067210 | 0.068323 | 0.061551 | 0.037320 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000038 | 0.046129 | 0.005916 | 0.016005 | 0.042949 | 0.050195 | 0.034522 | 0.004282 | 0.030359 | 0.051453 | 0.035355 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000042 | 0.051934 | 0.188616 | 0.053891 | 0.096821 | 0.092277 | 0.081045 | 0.033207 | 0.097401 | 0.090799 | 0.070873 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000047 | 0.025161 | 0.005216 | 0.027409 | 0.040261 | 0.038806 | 0.037399 | 0.011159 | 0.026379 | 0.051453 | 0.026998 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000048 | 0.028889 | 0.015447 | 0.130318 | 0.049612 | 0.039507 | 0.047743 | 0.294735 | 0.046029 | 0.057170 | 0.053225 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000060 | 0.374385 | 0.042534 | 0.311655 | 0.125334 | 0.149504 | 0.270540 | 0.086219 | 0.085539 | 0.110077 | 0.150517 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10000064 | 0.047153 | 0.151865 | 0.034060 | 0.129862 | 0.043262 | 0.041919 | 0.037752 | 0.068168 | 0.062172 | 0.104421 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN |
| 10000069 | 0.027097 | 0.009448 | 0.011860 | 0.043640 | 0.040589 | 0.037810 | 0.039804 | 0.032357 | 0.073824 | 0.043197 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 |
13 rows × 4090 columns
for ind in regions2.index.values:
    plt.figure(figsize=(10,4))
    # The 25 modules with the lowest relative frequency in this region (red bars).
    bot15 = regions2.loc[ind].sort_values(ascending=True)[0:25]
    x2 = bot15.index.values.tolist()
    x = pd.to_numeric(bot15.index.values).tolist()
    plot = sns.barplot(x = inv.loc[x, "typeName"], y=bot15.values, alpha=0.5, color='#FF0000')
    volums = []
    if (ind != 10000002):
        # Overlay the matching values from `volumes` in blue (NaN where a module is missing).
        volum = volumes.loc[ind]
        for i in x2:
            try:
                volums.append(volum[i])
            except (KeyError, IndexError):
                volums.append(np.nan)
        sns.barplot(x = inv.loc[x, "typeName"], y=volums, alpha=0.5, color='#0000FF')
    plt.title(regi.loc[ind, "regionName"])
    plt.xlabel("25 least used modules (relatively)")
    plt.ylabel("Relative Frequency")
    plot.set_xticklabels(plot.get_xticklabels(), rotation=30, ha="right")
    plt.show()
regions
| Region | 12773 | 31179 | 2913 | 3841 | 519 | 8089 | 1999 | 27387 | 5975 | 3244 | ... | 33474 | 33816 | 34317 | 34562 | 34828 | 35683 | 42685 | 12198 | 47466 | 49710 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10000002 | 0.080018 | 0.115866 | 0.074108 | 0.086198 | 0.188332 | 0.074222 | 0.083310 | 0.039119 | 0.095040 | 0.054379 | ... | 0.255172 | 0.036458 | 0.027426 | 0.054492 | 0.055901 | 0.050754 | 0.156951 | 0.000000 | 0.135659 | 0.041152 |
| 10000014 | 0.126062 | 0.107516 | 0.113890 | 0.162075 | 0.088424 | 0.152851 | 0.136867 | 0.180389 | 0.127495 | 0.112199 | ... | 0.068966 | 0.067708 | 0.132911 | 0.142857 | 0.114907 | 0.091907 | 0.031390 | 0.267442 | 0.046512 | 0.045267 |
| 10000016 | 0.032633 | 0.043841 | 0.041828 | 0.051538 | 0.035217 | 0.040808 | 0.033075 | 0.027332 | 0.037844 | 0.026594 | ... | 0.113793 | 0.026042 | 0.025316 | 0.030928 | 0.045031 | 0.028807 | 0.067265 | 0.000000 | 0.100775 | 0.061728 |
| 10000023 | 0.055878 | 0.070981 | 0.054103 | 0.100914 | 0.053969 | 0.089436 | 0.062137 | 0.139563 | 0.060747 | 0.061127 | ... | 0.055172 | 0.093750 | 0.086498 | 0.123711 | 0.093168 | 0.057613 | 0.040359 | 0.322674 | 0.069767 | 0.082305 |
| 10000030 | 0.077783 | 0.051148 | 0.063878 | 0.036056 | 0.059965 | 0.046922 | 0.060753 | 0.034677 | 0.049479 | 0.058150 | ... | 0.051724 | 0.046875 | 0.050633 | 0.042710 | 0.018634 | 0.042524 | 0.049327 | 0.000000 | 0.027132 | 0.028807 |
| 10000032 | 0.026375 | 0.039666 | 0.030689 | 0.034173 | 0.025765 | 0.041945 | 0.032937 | 0.010079 | 0.037232 | 0.025073 | ... | 0.031034 | 0.031250 | 0.021097 | 0.020619 | 0.017081 | 0.045267 | 0.098655 | 0.000000 | 0.096899 | 0.028807 |
| 10000033 | 0.060796 | 0.080376 | 0.063424 | 0.060813 | 0.094573 | 0.076923 | 0.067672 | 0.068671 | 0.078138 | 0.088714 | ... | 0.062069 | 0.140625 | 0.086498 | 0.079529 | 0.085404 | 0.085048 | 0.139013 | 0.000000 | 0.089147 | 0.131687 |
| 10000038 | 0.080018 | 0.057411 | 0.086383 | 0.035079 | 0.063319 | 0.045073 | 0.068364 | 0.037752 | 0.065891 | 0.079518 | ... | 0.044828 | 0.062500 | 0.105485 | 0.070692 | 0.034161 | 0.028807 | 0.053812 | 0.000000 | 0.034884 | 0.069959 |
| 10000042 | 0.073759 | 0.037578 | 0.089111 | 0.043308 | 0.074042 | 0.051614 | 0.065735 | 0.029723 | 0.057440 | 0.058018 | ... | 0.058621 | 0.062500 | 0.048523 | 0.033873 | 0.029503 | 0.057613 | 0.067265 | 0.000000 | 0.081395 | 0.074074 |
| 10000047 | 0.094323 | 0.092902 | 0.089793 | 0.092963 | 0.055900 | 0.065264 | 0.083310 | 0.063375 | 0.088059 | 0.072374 | ... | 0.041379 | 0.041667 | 0.105485 | 0.091311 | 0.080745 | 0.071331 | 0.022422 | 0.200581 | 0.050388 | 0.045267 |
| 10000048 | 0.087170 | 0.061587 | 0.070925 | 0.052793 | 0.048379 | 0.047775 | 0.060338 | 0.047831 | 0.071647 | 0.074226 | ... | 0.062069 | 0.065104 | 0.061181 | 0.038292 | 0.068323 | 0.069959 | 0.085202 | 0.000000 | 0.046512 | 0.069959 |
| 10000060 | 0.083594 | 0.093946 | 0.082519 | 0.132854 | 0.068554 | 0.112043 | 0.096457 | 0.200205 | 0.070545 | 0.077335 | ... | 0.068966 | 0.057292 | 0.061181 | 0.076583 | 0.243789 | 0.167353 | 0.026906 | 0.209302 | 0.143411 | 0.152263 |
| 10000064 | 0.024139 | 0.048017 | 0.048193 | 0.033336 | 0.036030 | 0.035831 | 0.043039 | 0.021182 | 0.053644 | 0.051535 | ... | 0.027586 | 0.057292 | 0.037975 | 0.038292 | 0.021739 | 0.057613 | 0.058296 | 0.000000 | 0.038760 | 0.053498 |
| 10000069 | 0.097452 | 0.099165 | 0.091157 | 0.077899 | 0.107531 | 0.119295 | 0.106006 | 0.100102 | 0.106797 | 0.160757 | ... | 0.058621 | 0.210938 | 0.149789 | 0.156112 | 0.091615 | 0.145405 | 0.103139 | 0.000000 | 0.038760 | 0.115226 |
14 rows × 1045 columns
for ind in regions2.index.values:
    plt.figure(figsize=(15,6))
    # The last 45 entries of the ascending sort, i.e. the 45 most used modules (red bars).
    bot15 = regions2.loc[ind].sort_values(ascending=True)[-45:]
    x2 = bot15.index.values.tolist()
    x = pd.to_numeric(bot15.index.values).tolist()
    plot = sns.barplot(x = inv.loc[x, "typeName"], y=bot15.values, alpha=0.5, color='#FF0000')
    volums = []
    if (ind != 10000002):
        # Overlay the matching values from `volumes` in blue (NaN where a module is missing).
        volum = volumes.loc[ind]
        for i in x2:
            try:
                volums.append(volum[i])
            except (KeyError, IndexError):
                volums.append(np.nan)
        sns.barplot(x = inv.loc[x, "typeName"], y=volums, alpha=0.5, color='#0000FF')
    plt.title(regi.loc[ind, "regionName"])
    plt.xlabel("45 most used modules (relatively)")
    plt.ylabel("Relative Frequency")
    plot.set_xticklabels(plot.get_xticklabels(), rotation=30, ha="right")
    plt.show()
for ind in regions2.index.values:
    plt.figure(figsize=(15,6))
    # All modules, sorted by relative frequency in this region (red bars).
    bot15 = regions2.loc[ind].sort_values(ascending=True)
    x2 = bot15.index.values.tolist()
    x = pd.to_numeric(bot15.index.values).tolist()
    plot = sns.barplot(x = inv.loc[x, "typeName"], y=bot15.values, alpha=0.5, color='#FF0000')
    volums = []
    if (ind != 10000002):
        # Overlay the matching values from `volumes` in blue (NaN where a module is missing).
        volum = volumes.loc[ind]
        for i in x2:
            try:
                volums.append(volum[i])
            except (KeyError, IndexError):
                volums.append(np.nan)
        sns.barplot(x = inv.loc[x, "typeName"], y=volums, alpha=0.5, color='#0000FF')
    plt.title(regi.loc[ind, "regionName"])
    plt.xlabel("All modules, sorted by relative frequency")
    plt.ylabel("Relative Frequency")
    plot.set_xticklabels(plot.get_xticklabels(), rotation=30, ha="right")
    plt.show()
In the interest of time, calculation of the ship price and the price of items on the ship has already been done for each ship in the killmail dataset, and stored in shipprice.csv.
shipprice = pd.read_csv("shipprice.csv")
shipprice = shipprice.dropna()
shipprice
| | Unnamed: 0.1 | Unnamed: 0 | ship_id | ship_price | module_price |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 47269 | 2.558519e+07 | 3.350072e+07 |
| 3 | 3 | 0 | 24700 | 5.292508e+07 | 1.426485e+07 |
| 5 | 5 | 0 | 35832 | 7.095789e+08 | 8.227371e+08 |
| 7 | 7 | 0 | 596 | 1.857917e+04 | 1.103100e+02 |
| 8 | 8 | 0 | 17636 | 6.217667e+08 | 3.960204e+08 |
| ... | ... | ... | ... | ... | ... |
| 3482989 | 925756 | 0 | 28659 | 1.052254e+09 | 1.066701e+09 |
| 3482990 | 925757 | 0 | 17720 | 2.626386e+08 | 2.122895e+07 |
| 3482991 | 925758 | 0 | 606 | 4.834000e+02 | 1.110000e+02 |
| 3482996 | 925763 | 0 | 33474 | 1.789000e+06 | 0.000000e+00 |
| 3482997 | 925764 | 0 | 33475 | 7.175000e+06 | 0.000000e+00 |
1934633 rows × 5 columns
Plotly is not used here as it freezes the notebook.
sns.scatterplot(x=shipprice["ship_price"], y=shipprice["module_price"])
<AxesSubplot: xlabel='ship_price', ylabel='module_price'>
As we can tell, it is impossible to see anything: a few outliers stretch the axes, so almost all of the points are squashed into the bottom left corner. As the prices of ships in Eve can range from a few hundred to apparently tens of billions of currency, it is a good idea to take the logarithm of both prices. Because the logarithm of 0 is undefined, we first remove all rows with a module price of 0.
This also makes sense for the analysis: a module price of 0 suggests that no modules were on the ship, and we do not want to consider those ships as they would skew the data.
shipprice = shipprice[shipprice["module_price"] > 0]
sns.scatterplot(x=np.log(shipprice["ship_price"]), y=np.log(shipprice["module_price"]))
<AxesSubplot: xlabel='ship_price', ylabel='module_price'>
While the logarithm removes the issue of the outliers being too far out, it is still impossible to accurately see a correlation. Many of the roughly 1.6 million remaining points are hidden behind other points. So, we will make the graph bigger and the points smaller, so that we can see the density of the points.
plt.figure(figsize=(50,40))
sns.scatterplot(x=np.log(shipprice["ship_price"]), y=np.log(shipprice["module_price"]),s =1)
<AxesSubplot: xlabel='ship_price', ylabel='module_price'>
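Shrinking the markers is one way to reveal density. Another option, which is not used in this analysis, is to bin the log prices into a grid and count the points in each cell. The sketch below uses synthetic prices rather than the killmail data, and `density_grid` is a hypothetical helper.

```python
import numpy as np

def density_grid(ship_price, module_price, bins=50):
    """Count points per cell on a log-log grid (hypothetical helper)."""
    log_x = np.log(np.asarray(ship_price, dtype=float))
    log_y = np.log(np.asarray(module_price, dtype=float))
    counts, x_edges, y_edges = np.histogram2d(log_x, log_y, bins=bins)
    return counts, x_edges, y_edges

# Synthetic stand-in for shipprice: module price roughly tracks ship price.
rng = np.random.default_rng(0)
ship = np.exp(rng.uniform(np.log(1e3), np.log(1e10), size=10_000))
module = ship * np.exp(rng.normal(0.0, 1.0, size=10_000))

counts, x_edges, y_edges = density_grid(ship, module, bins=40)
print(counts.shape, int(counts.sum()))
```

The resulting grid could then be drawn as a heatmap, for example with `plt.pcolormesh(x_edges, y_edges, counts.T)`. Overlapping points add to a cell's count instead of hiding each other.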
With this, we can see that in the middle to right side of the graph, an increase in ship price is met with an increase in the module price. However, in the left side of the graph, the points appear to be approximately randomly scattered.
The graph has a great many outliers. We can try colouring the points by ship type, in case certain ship types deviate from this trend.
# Map each ship to its market group, then to the name of that group's parent group.
ship_groups = pd.to_numeric(inv["marketGroupID"][shipprice["ship_id"]])
parent_groups = pd.to_numeric(marketgroups["parentGroupID"][ship_groups].tolist())
shipprice = shipprice.copy()  # work on a real copy so the assignment below does not raise SettingWithCopyWarning
shipprice["group"] = marketgroups["marketGroupName"][parent_groups].tolist()
shipprice
| | Unnamed: 0.1 | Unnamed: 0 | ship_id | ship_price | module_price | group |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 47269 | 2.558519e+07 | 3.350072e+07 | Precursor Frigates |
| 3 | 3 | 0 | 24700 | 5.292508e+07 | 1.426485e+07 | Standard Battlecruisers |
| 5 | 5 | 0 | 35832 | 7.095789e+08 | 8.227371e+08 | Citadels |
| 7 | 7 | 0 | 596 | 1.857917e+04 | 1.103100e+02 | Corvettes |
| 8 | 8 | 0 | 17636 | 6.217667e+08 | 3.960204e+08 | Faction Battleships |
| ... | ... | ... | ... | ... | ... | ... |
| 3482985 | 925752 | 0 | 17480 | 3.899279e+07 | 5.674360e+06 | Mining Barges |
| 3482987 | 925754 | 0 | 16242 | 1.098000e+06 | 4.493609e+06 | Standard Destroyers |
| 3482989 | 925756 | 0 | 28659 | 1.052254e+09 | 1.066701e+09 | Marauders |
| 3482990 | 925757 | 0 | 17720 | 2.626386e+08 | 2.122895e+07 | Faction Cruisers |
| 3482991 | 925758 | 0 | 606 | 4.834000e+02 | 1.110000e+02 | Corvettes |
1569024 rows × 6 columns
plt.figure(figsize=(50,40))
sns.scatterplot(x=np.log(shipprice["ship_price"]), y=np.log(shipprice["module_price"]), s=5, hue=shipprice["group"])
<AxesSubplot: xlabel='ship_price', ylabel='module_price'>
We see that most of the categories of ships approximately follow that trend. However, there are too many categories to analyze individually. So, we merge the categories into broader groups, and only keep the basic combat ships, which comprise most of the data.
fshipprice = shipprice.copy()
fshipprice["group"] = fshipprice["group"].str.split()
fshipprice = fshipprice[fshipprice["group"].str[0].isin(["Precursor", "Faction", "Advanced", "Standard", "Freighters"])]
# Collapse each group name to one label: "Freighters" if present, otherwise its first word.
# (Named simplify_group so that it does not shadow the built-in filter.)
def simplify_group(x):
    words = x["group"]
    if "Freighters" in words:
        x["group"] = "Freighters"
    else:
        x["group"] = words[0]
    return x
fshipprice = fshipprice.apply(simplify_group, axis=1)
fshipprice
| | Unnamed: 0.1 | Unnamed: 0 | ship_id | ship_price | module_price | group |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 47269 | 2.558519e+07 | 3.350072e+07 | Precursor |
| 3 | 3 | 0 | 24700 | 5.292508e+07 | 1.426485e+07 | Standard |
| 8 | 8 | 0 | 17636 | 6.217667e+08 | 3.960204e+08 | Faction |
| 13 | 13 | 0 | 602 | 4.966000e+05 | 1.563368e+07 | Standard |
| 14 | 14 | 0 | 626 | 1.041000e+07 | 7.912130e+06 | Standard |
| ... | ... | ... | ... | ... | ... | ... |
| 3482975 | 925742 | 0 | 24698 | 4.879308e+07 | 1.457094e+07 | Standard |
| 3482983 | 925750 | 0 | 589 | 6.000000e+05 | 4.806029e+05 | Standard |
| 3482984 | 925751 | 0 | 4310 | 6.622831e+07 | 5.104411e+07 | Standard |
| 3482987 | 925754 | 0 | 16242 | 1.098000e+06 | 4.493609e+06 | Standard |
| 3482990 | 925757 | 0 | 17720 | 2.626386e+08 | 2.122895e+07 | Faction |
1066229 rows × 6 columns
plt.figure(figsize=(50,40))
sns.scatterplot(x=np.log(fshipprice["ship_price"]), y=np.log(fshipprice["module_price"]), hue=fshipprice["group"],s =1)
<AxesSubplot: xlabel='ship_price', ylabel='module_price'>
pearson_coef, p_value = stats.pearsonr(np.log(shipprice["ship_price"]), np.log(shipprice["module_price"]))
pearson_coef, p_value
(0.8872665778833148, 0.0)
Based on the Pearson coefficient (about 0.89) and the p-value (reported as 0.0, meaning it is too small to be represented), we are confident that there is indeed a strong positive correlation between the logarithm of the price of the ship and the logarithm of the price of the modules on it. However, because the relationship holds on a logarithmic scale, a prediction converted back to actual prices can easily be off by a factor of a few.
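To make the last point concrete: a straight-line fit on log prices has additive errors in log space, and these become multiplicative errors once the logarithm is undone. The sketch below uses synthetic data, not the killmail prices, and the slope, intercept and noise level are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic log prices: log(module) = 0.9 * log(ship) + 1 + noise (assumed values).
log_ship = rng.uniform(np.log(1e4), np.log(1e10), size=5_000)
log_module = 0.9 * log_ship + 1.0 + rng.normal(0.0, 1.0, size=5_000)

# Least-squares line in log space.
slope, intercept = np.polyfit(log_ship, log_module, deg=1)
residuals = log_module - (slope * log_ship + intercept)

# A residual of r in log space means the true price is exp(r) times the prediction.
typical_factor = np.exp(residuals.std())
r = np.corrcoef(log_ship, log_module)[0, 1]
print(f"r = {r:.3f}, typical error factor = {typical_factor:.2f}x")
```

Even with a correlation well above 0.9, a residual standard deviation of about 1 in natural-log units corresponds to predictions that are typically wrong by a factor of between two and three.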
In the interest of time, the number of times each module was recorded in pvp or pve contexts has already been calculated and stored in pnp.csv.
pnp = pd.read_csv("pnp.csv").set_index("Unnamed: 0")
pnp
| Unnamed: 0 | P | N | PN |
|---|---|---|---|
| 24700 | 1211.0 | 217.0 | 579.0 |
| 670 | 109636.0 | 3186.0 | 1006.0 |
| 16240 | 2634.0 | 5740.0 | 2397.0 |
| 11134 | 1563.0 | 341.0 | 36.0 |
| 1944 | 1012.0 | 78.0 | 22.0 |
| ... | ... | ... | ... |
| 62628 | 1.0 | 0.0 | 0.0 |
| 20985 | 0.0 | 0.0 | 8.0 |
| 61213 | 2.0 | 0.0 | 0.0 |
| 31332 | 0.0 | 0.0 | 1.0 |
| 62632 | 1.0 | 0.0 | 0.0 |
4961 rows × 3 columns
As in question 1, we should filter out modules with very low numbers of occurrences. We also drop the column PN as it is not used here.
pnp = pnp[pnp.sum(axis=1) > 100][["P","N"]]
pnp
| Unnamed: 0 | P | N |
|---|---|---|
| 24700 | 1211.0 | 217.0 |
| 670 | 109636.0 | 3186.0 |
| 16240 | 2634.0 | 5740.0 |
| 11134 | 1563.0 | 341.0 |
| 1944 | 1012.0 | 78.0 |
| ... | ... | ... |
| 53343 | 94.0 | 44.0 |
| 54291 | 495.0 | 147.0 |
| 58919 | 177.0 | 36.0 |
| 58972 | 136.0 | 36.0 |
| 58966 | 77.0 | 17.0 |
1998 rows × 2 columns
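As we are only interested in comparing the relative proportions of modules used in these contexts, we normalise in two steps. First, we divide each column by its total, which accounts for the different overall numbers of pvp and pve records. Then, we divide each row by its sum, so that P and N add up to 1 for every module.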
pnp = pnp / pnp.sum()
pnp = (pnp.T / pnp.sum(axis=1)).T
pnp = pnp.sort_values(by="P", ascending=True)
pnp
| Unnamed: 0 | P | N |
|---|---|---|
| 2003 | 0.008755 | 0.991245 |
| 7937 | 0.027439 | 0.972561 |
| 31502 | 0.052395 | 0.947605 |
| 20795 | 0.052818 | 0.947182 |
| 31312 | 0.058529 | 0.941471 |
| ... | ... | ... |
| 21926 | 1.000000 | 0.000000 |
| 20060 | 1.000000 | 0.000000 |
| 37532 | 1.000000 | 0.000000 |
| 28351 | 1.000000 | 0.000000 |
| 20064 | 1.000000 | 0.000000 |
1998 rows × 2 columns
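The effect of the two divisions can be checked on a small made-up table. The counts below are invented for illustration; only the two normalisation steps mirror the code above.

```python
import pandas as pd

# Invented counts: pvp records are ten times more common than pve records overall.
toy = pd.DataFrame(
    {"P": [900.0, 50.0, 50.0], "N": [10.0, 45.0, 45.0]},
    index=["module_a", "module_b", "module_c"],
)

# Step 1: divide each column by its total, so P and N each sum to 1.
by_context = toy / toy.sum()

# Step 2: divide each row by its total, so P + N = 1 for every module.
share = by_context.div(by_context.sum(axis=1), axis=0)
print(share)
```

In this toy table, module_b appears slightly more often in pvp than in pve by raw count (50 against 45), yet its normalised pvp share is 0.1, because pvp records are ten times more numerous overall.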
Now we plot the proportion of modules found in pvp contexts.
plt.figure(figsize=(200,100))
sns.barplot(x = inv["typeName"][pnp.index.values], y=pnp["P"])
<AxesSubplot: xlabel='typeName', ylabel='P'>
plt.figure(figsize=(20,10))
sns.kdeplot(data=pnp["P"])
<AxesSubplot: xlabel='P', ylabel='Density'>
From the sorted bar plot, we can deduce that about as many modules are used more in pvp contexts as are used more in pve contexts. The bars also rise approximately linearly, so the pvp proportion is approximately uniformly distributed across modules. However, the plot curves at both ends, so there are comparatively few modules that are used almost exclusively against players or almost exclusively against non-player enemies.
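The link between a straight sorted curve and a uniform distribution can be illustrated with synthetic numbers. Sorting uniform samples gives values that rise at a constant rate, whereas samples clustered around the middle give a curve that is flat in the centre and steep at both ends. The sample size and the two distributions below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
rank = np.arange(n) / (n - 1)  # position along the x-axis, scaled to [0, 1]

uniform = np.sort(rng.uniform(0.0, 1.0, size=n))
clustered = np.sort(np.clip(rng.normal(0.5, 0.1, size=n), 0.0, 1.0))

# Maximum vertical gap between each sorted curve and the straight line y = x.
gap_uniform = np.abs(uniform - rank).max()
gap_clustered = np.abs(clustered - rank).max()
print(f"uniform: {gap_uniform:.3f}, clustered: {gap_clustered:.3f}")
```

The uniform samples stay close to the straight line, while the clustered samples deviate from it strongly near both ends.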
Zooming in on the top right side of the graph:
plt.figure(figsize = (20,10))
plot = sns.barplot(x = inv["typeName"][pnp[pnp["P"] == 1].index.values], y=pnp[pnp["P"] == 1]["P"])
plot.set_xticklabels(plot.get_xticklabels(), rotation=30, ha="right")
plt.show()
Zooming in on the bottom left side of the graph:
plt.figure(figsize = (20,10))
plot = sns.barplot(x = inv["typeName"][pnp[pnp["P"] <= 0.15].index.values], y=pnp[pnp["P"] <= 0.15]["P"])
plot.set_xticklabels(plot.get_xticklabels(), rotation=30, ha="right")
plt.show()
Upon further investigation, the items that have been found only in pvp situations are mostly structures and items fitted on structures. This makes sense, as non-player enemies do not go around the solar system attacking and destroying structures, but players do.
In the interest of time, the number of occurrences of each pair of modules (duplicates not counted) has already been calculated and stored in pairing.csv.
pairs = pd.read_csv("pairing.csv").set_index("0")
pairs
| 0 | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 | ... | 62590 | 62591 | 62622 | 62625 | 62628 | 62631 | 62632 | 62636 | 63140 | 63165 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 178 | 0.0 | 12.0 | 9.0 | 10.0 | 16.0 | 7.0 | 11.0 | 30.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 179 | 12.0 | 0.0 | 6.0 | 10.0 | 12.0 | 9.0 | 11.0 | 19.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 180 | 9.0 | 6.0 | 0.0 | 11.0 | 13.0 | 8.0 | 10.0 | 15.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 181 | 10.0 | 10.0 | 11.0 | 0.0 | 11.0 | 5.0 | 11.0 | 17.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 182 | 16.0 | 12.0 | 13.0 | 11.0 | 0.0 | 12.0 | 16.0 | 34.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 62631 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 62632 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 62636 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 63140 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 63165 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4162 rows × 4162 columns
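The precomputed file is not regenerated in this notebook. As a rough sketch of how pair counts like these could be produced (an assumption, not necessarily the preprocessing that was actually used), each killmail's module list can be reduced to its unique ids and every unordered pair counted once per killmail. The `fits` list and its ids are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: one list of module type ids per killmail.
fits = [
    [178, 179, 179, 185],  # 179 is fitted twice but only counted once
    [178, 185],
    [179, 180],
]

pair_counts = Counter()
for modules in fits:
    unique_ids = sorted(set(modules))         # drop duplicates within one fit
    for a, b in combinations(unique_ids, 2):  # every unordered pair with a < b
        pair_counts[(a, b)] += 1

print(pair_counts[(178, 185)])
```

Because a module is never paired with itself here, the diagonal of the resulting matrix stays at zero, which matches the zeros on the diagonal of the table above.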
Now we can try to plot a heatmap to see which pairs are most commonly seen. I have tried this, and with more than 4000 modules it is impossible to see anything. Check "heatmap.png" to see what it looks like. It cannot be displayed here without lagging the notebook.
So instead, we will plot item groups against each other rather than every single item against every other item.
Once again, in the interest of time, the number of occurrences of groups of modules has already been calculated and stored in region_pairing.csv.
regionpairs = pd.read_csv("region_pairing.csv").set_index('group')
regionpairs
| 102 | 103 | 105 | 106 | 107 | 108 | 109 | 112 | 113 | 116 | ... | 2738 | 2740 | 2742 | 2743 | 2744 | 2783 | 2795 | 2804 | 2805 | 2815 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| group | |||||||||||||||||||||
| 102 | 1344.0 | 31.0 | 9.0 | 0.0 | 140.0 | 15.0 | 0.0 | 4.0 | 71.0 | 3.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 103 | 31.0 | 228.0 | 10.0 | 0.0 | 7.0 | 41.0 | 0.0 | 18.0 | 3.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 105 | 9.0 | 10.0 | 124.0 | 3.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 106 | 0.0 | 0.0 | 3.0 | 20.0 | 5.0 | 10.0 | 5.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 107 | 140.0 | 7.0 | 0.0 | 5.0 | 1078.0 | 74.0 | 0.0 | 6.0 | 130.0 | 12.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2783 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2795 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2804 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 0.0 | 0.0 |
| 2805 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 |
| 2815 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
400 rows × 400 columns
We want relatively accurate data, so module groups with too few occurrences will be excluded.
regionpairs = regionpairs.loc[:, regionpairs.sum() > 200]
regionpairs = regionpairs.loc[pd.to_numeric(regionpairs.columns.values.tolist())]
regionpairs
| 102 | 103 | 105 | 106 | 107 | 108 | 109 | 112 | 113 | 116 | ... | 2467 | 2468 | 2469 | 2470 | 2471 | 2509 | 2529 | 2783 | 2804 | 2805 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| group | |||||||||||||||||||||
| 102 | 1344.0 | 31.0 | 9.0 | 0.0 | 140.0 | 15.0 | 0.0 | 4.0 | 71.0 | 3.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 103 | 31.0 | 228.0 | 10.0 | 0.0 | 7.0 | 41.0 | 0.0 | 18.0 | 3.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 105 | 9.0 | 10.0 | 124.0 | 3.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 106 | 0.0 | 0.0 | 3.0 | 20.0 | 5.0 | 10.0 | 5.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 107 | 140.0 | 7.0 | 0.0 | 5.0 | 1078.0 | 74.0 | 0.0 | 6.0 | 130.0 | 12.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2509 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2529 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2783 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2804 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 0.0 |
| 2805 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
350 rows × 350 columns
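The two filtering lines above can be easier to follow on a small table. The sketch below uses made-up counts and assumes, as the `pd.to_numeric` call suggests, that the row labels are integers while the column labels are strings.

```python
import pandas as pd

# Toy symmetric count matrix; labels and values are made up.
counts = pd.DataFrame(
    [[300, 5, 1], [5, 250, 0], [1, 0, 2]],
    index=[102, 103, 105],          # integer row labels
    columns=["102", "103", "105"],  # string column labels
)

keep = counts.sum() > 200                 # one boolean per column
filtered = counts.loc[:, keep]            # drop the rare columns
row_labels = pd.to_numeric(filtered.columns.values).tolist()  # "102" -> 102
filtered = filtered.loc[row_labels]       # keep only the matching rows

print(filtered)
```

Filtering the columns first and then selecting the same labels as rows keeps the matrix square, so every remaining group appears on both axes.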
Now we can plot the heatmap.
normalized = (regionpairs / regionpairs.sum()).T
plt.figure(figsize=(75,70))
sns.heatmap(np.log(np.log(normalized+1)+1), cmap='turbo')
plt.show()
The values on each row add up to 1. So in each row, the brightest points in that row are the modules that are most commonly fit alongside it. The columns are not normalized in the same way, so brighter columns are modules that appear a lot more in the dataset.
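A minimal sketch of this normalization on a made-up 2×2 count table shows why the rows, and not the columns, sum to 1. The labels "A" and "B" are placeholders.

```python
import pandas as pd

counts = pd.DataFrame(
    [[10, 30], [30, 60]],
    index=["A", "B"],
    columns=["A", "B"],
)

# Dividing a DataFrame by a Series aligns the Series with the columns,
# so every column is divided by its own total.
col_normalized = counts / counts.sum()
normalized = col_normalized.T  # after transposing, each row sums to 1

print(normalized.sum(axis=1))  # row sums
print(normalized.sum(axis=0))  # column sums are not forced to 1
```

In this toy table, column "B" has the larger sum because "B" appears more often overall. Since log(log(x+1)+1) is increasing in x for x ≥ 0, the transformation used in the heatmap changes the contrast but not which cells in a row are brightest.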
grouppair2 = pd.read_csv("region_pairing2.csv").set_index("ngroup")
grouppair2
| 9 | 10 | 11 | 14 | 52 | 114 | 115 | 117 | 118 | 120 | ... | 2227 | 2297 | 2340 | 2432 | 2463 | 2464 | 2527 | 2729 | 2730 | 2741 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ngroup | |||||||||||||||||||||
| 9 | 56188.0 | 0.0 | 21344.0 | 24177.0 | 42742.0 | 9.0 | 232.0 | 3899.0 | 383.0 | 3377.0 | ... | 0.0 | 1012.0 | 0.0 | 31.0 | 15.0 | 16.0 | 5.0 | 0.0 | 0.0 | 0.0 |
| 10 | 0.0 | 0.0 | 3409.0 | 346.0 | 3028.0 | 0.0 | 1.0 | 5.0 | 6.0 | 71.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 11 | 21344.0 | 3409.0 | 35472.0 | 49863.0 | 78048.0 | 574.0 | 145.0 | 10330.0 | 4802.0 | 4230.0 | ... | 4.0 | 2003.0 | 3.0 | 1176.0 | 475.0 | 701.0 | 27.0 | 1.0 | 0.0 | 1.0 |
| 14 | 24177.0 | 346.0 | 49863.0 | 800.0 | 111828.0 | 393.0 | 150.0 | 10255.0 | 7564.0 | 8634.0 | ... | 0.0 | 3208.0 | 0.0 | 1237.0 | 486.0 | 740.0 | 25.0 | 5.0 | 2.0 | 7.0 |
| 52 | 42742.0 | 3028.0 | 78048.0 | 111828.0 | 16710.0 | 988.0 | 353.0 | 22386.0 | 12954.0 | 23317.0 | ... | 0.0 | 4907.0 | 0.0 | 1684.0 | 644.0 | 1029.0 | 35.0 | 7.0 | 12.0 | 19.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2464 | 16.0 | 0.0 | 701.0 | 740.0 | 1029.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.0 | ... | 0.0 | 144.0 | 0.0 | 995.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2527 | 5.0 | 0.0 | 27.0 | 25.0 | 35.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2729 | 0.0 | 0.0 | 1.0 | 5.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 |
| 2730 | 0.0 | 0.0 | 0.0 | 2.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 12.0 |
| 2741 | 0.0 | 0.0 | 1.0 | 7.0 | 19.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 12.0 | 0.0 |
98 rows × 98 columns
grouppair2 = grouppair2.loc[:, grouppair2.sum() > 200]
grouppair2 = grouppair2.loc[pd.to_numeric(grouppair2.columns.values.tolist())]
grouppair2
| 9 | 10 | 11 | 14 | 52 | 114 | 115 | 117 | 118 | 120 | ... | 2208 | 2209 | 2226 | 2227 | 2297 | 2340 | 2432 | 2463 | 2464 | 2527 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ngroup | |||||||||||||||||||||
| 9 | 56188.0 | 0.0 | 21344.0 | 24177.0 | 42742.0 | 9.0 | 232.0 | 3899.0 | 383.0 | 3377.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1012.0 | 0.0 | 31.0 | 15.0 | 16.0 | 5.0 |
| 10 | 0.0 | 0.0 | 3409.0 | 346.0 | 3028.0 | 0.0 | 1.0 | 5.0 | 6.0 | 71.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 11 | 21344.0 | 3409.0 | 35472.0 | 49863.0 | 78048.0 | 574.0 | 145.0 | 10330.0 | 4802.0 | 4230.0 | ... | 6.0 | 0.0 | 5.0 | 4.0 | 2003.0 | 3.0 | 1176.0 | 475.0 | 701.0 | 27.0 |
| 14 | 24177.0 | 346.0 | 49863.0 | 800.0 | 111828.0 | 393.0 | 150.0 | 10255.0 | 7564.0 | 8634.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 3208.0 | 0.0 | 1237.0 | 486.0 | 740.0 | 25.0 |
| 52 | 42742.0 | 3028.0 | 78048.0 | 111828.0 | 16710.0 | 988.0 | 353.0 | 22386.0 | 12954.0 | 23317.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 4907.0 | 0.0 | 1684.0 | 644.0 | 1029.0 | 35.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2340 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 93.0 | 19.0 | 76.0 | 52.0 | 0.0 | 188.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2432 | 31.0 | 0.0 | 1176.0 | 1237.0 | 1684.0 | 0.0 | 0.0 | 0.0 | 0.0 | 62.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 229.0 | 0.0 | 0.0 | 627.0 | 995.0 | 0.0 |
| 2463 | 15.0 | 0.0 | 475.0 | 486.0 | 644.0 | 0.0 | 0.0 | 0.0 | 0.0 | 24.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 85.0 | 0.0 | 627.0 | 0.0 | 0.0 | 0.0 |
| 2464 | 16.0 | 0.0 | 701.0 | 740.0 | 1029.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 144.0 | 0.0 | 995.0 | 0.0 | 0.0 | 0.0 |
| 2527 | 5.0 | 0.0 | 27.0 | 25.0 | 35.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
94 rows × 94 columns
normalized = (grouppair2 / grouppair2.sum()).T
normalized.columns = marketgroups["marketGroupName"][pd.to_numeric(normalized.columns.values).tolist()]
normalized.index = marketgroups["marketGroupName"][pd.to_numeric(normalized.index.values).tolist()]
normalized.index.name = "row"
normalized.columns.name='column'
fig = px.imshow(np.log(np.log(normalized+1)+1), text_auto = True, width=600, height=500)
fig.update_layout().update_yaxes(automargin=False).update_xaxes(automargin=False)
fig.show()
#plt.show()
for ind in regions.index.values:
    # Modules in this region, sorted from least to most frequent
    bot15 = regions.loc[ind].sort_values(ascending=True)
    x = pd.to_numeric(bot15.index.values).tolist()
    # Plotly draws its own figure, so no plt.figure() call is needed
    plot = px.bar(x = inv.loc[x, "typeName"], y=bot15.values)
    plot.update_layout(title_text = (regi.loc[ind, "regionName"]), yaxis_title = "Relative Frequency", xaxis_title = "Module")
    plot.show()
The above graphs show, for each region, how many times each module appears in the killmails, sorted in ascending order. Each curve is approximately linear in the middle, but bends towards the extremes at the left and right ends. This trend appears in all the regions.
for ind in regions2.index.values:
    plt.figure(figsize=(15,6))
    # Killmail frequency of each module in this region, sorted ascending (red bars)
    bot15 = regions2.loc[ind].sort_values(ascending=True)
    x2 = bot15.index.values.tolist()
    x = pd.to_numeric(bot15.index.values).tolist()
    plot = sns.barplot(x = inv.loc[x, "typeName"], y=bot15.values, alpha=0.5, color='#FF0000')
    volums = []
    # Region 10000002 is skipped for the market volume overlay
    if (ind != 10000002):
        volum = volumes.loc[ind]
        for i in x2:
            try:
                volums.append(volum[i])
            except KeyError:
                # No market volume recorded for this module in this region
                volums.append(np.nan)
        # Market volume of the same modules, in the same order (blue bars)
        plot2 = sns.barplot(x = inv.loc[x, "typeName"], y=volums, alpha=0.5, color='#0000FF')
        plot2.set_xticklabels([])
    plt.title(regi.loc[ind, "regionName"])
    plt.xlabel("Modules")
    plt.ylabel("Relative Frequency/Volume")
    plot.set_xticklabels([])
    plt.show()
The y axis represents the proportion of market volume for that product in the region (blue), or the proportion of it seen in the killmails (red). The x axis is the module, with the labels hidden. When we sort the module frequency in ascending order, the module volumes appear to be randomly scattered. In general, we might see a very slight increase in the relative volume of the modules, but it is much smaller than the increase in module frequency.
In addition, some regions have the blue graph extending far above the red graph, while others have the blue graph almost completely contained within the red graph. This means some regions have higher average market volume, while some regions have higher average module usage. So the average usage of modules in a region does not necessarily depend on the average market volume in that region. This also refutes the hypothesis that there is moderate correlation.
In conclusion, there is very little correlation between use of modules and their availability.
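The conclusion above is read off the plots. One way it could be checked numerically is a rank correlation between killmail frequency and market volume for a single region. The sketch below uses made-up numbers, not values from the dataset, and computes Spearman's coefficient as the Pearson correlation of the ranks.

```python
import pandas as pd

# Made-up relative frequencies for five modules in one region (illustration only).
modules = ["a", "b", "c", "d", "e"]
kill_freq = pd.Series([0.01, 0.02, 0.05, 0.10, 0.30], index=modules)
market_vol = pd.Series([0.20, 0.05, 0.30, 0.10, 0.15], index=modules)

# Spearman's rank correlation is the Pearson correlation of the ranks.
rho = kill_freq.rank().corr(market_vol.rank())
print(round(rho, 2))
```

A coefficient near 0 would support the claim of very little correlation, while a value near 1 or -1 would contradict it.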
plt.figure(figsize=(30,20))
sns.scatterplot(x=np.log(fshipprice["ship_price"]), y=np.log(fshipprice["module_price"]), hue=fshipprice["group"],s =1)
plt.plot([10,22], [10,22], color="red")
plt.title("Graph of module prices against the price of the ship they are on")
plt.annotate("y=x", (10, 9.8))
plt.xlabel("Natural logarithm of ship price")
plt.ylabel("Natural logarithm of module price")
From the graph, the densest areas form a slight upward trend, with the exception of the purple group, which is freighters. The orange group (standard ships) forms the bulk of the left side of the graph, but the price of the modules fit on them does not increase much as the ship price increases. The blue points (precursor ships) start slightly higher than the green points (faction ships), but both gradually trend towards the same module prices.
The advanced ships (red) are very rarely seen, and occupy a rather small range of values on the x axis, so any trend is hard to see.
The standard ships occupy a greater range of x values, and the price of the modules generally increases as the price of the ship increases. However, the slope on the graph is very gradual, meaning the prices of modules do not increase as much as the prices of the ships they are on.
The precursor ships start slightly above the rest of the graph, meaning the modules fit on the cheaper precursor ships are on average more expensive than those on other cheap ships, while on the more expensive ships they are, on average, around the same price.
Faction ships always appear to be slightly below the y=x line, meaning the modules fit on them are on average cheaper than the ship itself.
Freighters are an exception, being much more expensive than the modules usually fit on them. This could be because freighters are used to carry items and do almost no fighting. The ones appearing on the killmails could have been shot down by a fleet of players seeking the cargo in the freighter. This means these freighters do little combat, so there would not be a good reason to overspend on modules that only marginally increase their power.
However, this graph uses logarithmic axes, so even a slight deviation from the line corresponds to multiplying or dividing the price by a significant factor. Any attempt to predict module prices from ship prices would therefore be off by a factor of several.
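To make the size of these deviations concrete: if a point sits a vertical distance d from the y=x line on the natural-log plot, then ln(module price) - ln(ship price) = d, so the module price is the ship price multiplied by e^d. The helper name `price_ratio` below is just for illustration.

```python
import math

def price_ratio(log_gap: float) -> float:
    """Factor between module and ship price for a vertical gap on the natural-log plot."""
    return math.exp(log_gap)

for gap in (0.5, 1.0, 2.0):
    print(f"gap {gap}: factor {price_ratio(gap):.2f}")
```

A gap of 1 unit already corresponds to a factor of about 2.72, and a gap of 2 units to a factor of about 7.39, so points that look close to the line can still differ in price by several times.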
plt.figure(figsize=(200,100))
sns.barplot(x = inv["typeName"][pnp.index.values], y=pnp["P"])
plt.title("Usage rates of modules in PvP scenarios")
plt.xlabel("Modules")
plt.ylabel("Usage rate")
sns.kdeplot(data=pnp["P"])
plt.title("Density of modules over the PvP usage rate")
plt.xlabel("PvP usage rate")
From the graphs, we notice that the PvP usage rate of modules or ships is approximately normally distributed, but with a slight left skew and a slightly flatter top. The flatter top means that PvP usage rates slightly above or below 50% are more common than a normal distribution would suggest.
So the popularity of items is, on average, different against player and non-player enemies: PvP and PvE fights are different enough that many items are preferred for one situation or the other.
So yes, the popularity of modules or ships does tend to be at least slightly different for PvP or PvE situations.
Additionally, from the top end, there are things that are only used in PvP situations.
plt.figure(figsize = (20,10))
plot = sns.barplot(x = inv["typeName"][pnp[pnp["P"] == 1].index.values], y=pnp[pnp["P"] == 1]["P"])
plot.set_xticklabels(plot.get_xticklabels(), rotation=30, ha="right")
plt.title("Modules or ships exclusively used in PvP")
plt.xlabel("Modules")
plt.ylabel("PvP Usage Rate")
plt.show()
Judging from the module names, these are modules that are fit on structures and capital ships, along with the structures and capital ships themselves. This is likely because they are massive and impractical to use for raiding the generally more secluded bases of pirates and other non-player enemies. Additionally, structures are rooted in place, so they cannot be used to attack non-player enemies, which do not move out from their bases.
normalized = (grouppair2 / grouppair2.sum()).T
normalized.columns = marketgroups["marketGroupName"][pd.to_numeric(normalized.columns.values).tolist()]
normalized.index = marketgroups["marketGroupName"][pd.to_numeric(normalized.index.values).tolist()]
normalized.index.name = "row"
normalized.columns.name='column'
fig = px.imshow(np.log(np.log(normalized+1)+1), text_auto = True, width=600, height=500)
fig.update_layout(title_text = "Which module groups are commonly paired together?", yaxis_title = "", xaxis_title = "").update_yaxes(automargin=False).update_xaxes(automargin=False)
fig.show()
#plt.show()
The trend seems to be that modules are paired with their rigs: for example, scanning rigs are paired with scanner modules, harvesting equipment is paired with resource processing rigs, electronic warfare modules are paired with their rigs, and so on. This makes sense, as rigs give bonuses to that specific type of item. In addition, items of similar types also seem to be paired with each other, such as structure engineering rigs with each other, electronic warfare with energy neutralizers, and scanning equipment with other scanning equipment. The brighter points on the heatmap are the pairs of module groups that occur most often relative to other pairs.
All the links have been stated under the dataset. The only other reference is EVE Online itself.
There are many additional notebooks that helped in the data processing and downloading, as well as early stage EDA.
The rest of the raw data is available here: https://drive.google.com/file/d/14tmkJL2JIrz3QHsyUNz2nCEzGWqODzwu/view?usp=sharing